Kandy
PL-Guard: Benchmarking Language Model Safety for Polish
Krasnodฤbska, Aleksandra, Seweryn, Karolina, ลukasik, Szymon, Kusa, Wojciech
Despite increasing efforts to ensure the safety of large language models (LLMs), most existing safety assessments and moderation tools remain heavily biased toward English and other high-resource languages, leaving majority of global languages underexamined. To address this gap, we introduce a manually annotated benchmark dataset for language model safety classification in Polish. We also create adversarially perturbed variants of these samples designed to challenge model robustness. We conduct a series of experiments to evaluate LLM-based and classifier-based models of varying sizes and architectures. Specifically, we fine-tune three models: Llama-Guard-3-8B, a HerBERT-based classifier (a Polish BERT derivative), and PLLuM, a Polish-adapted Llama-8B model. We train these models using different combinations of annotated data and evaluate their performance, comparing it against publicly available guard models. Results demonstrate that the HerBERT-based classifier achieves the highest overall performance, particularly under adversarial conditions.
Automatic feature selection and weighting in molecular systems using Differentiable Information Imbalance
Wild, Romina, Wodaczek, Felix, Del Tatto, Vittorio, Cheng, Bingqing, Laio, Alessandro
Feature selection is essential in the analysis of molecular systems and many other fields, but several uncertainties remain: What is the optimal number of features for a simplified, interpretable model that retains essential information? How should features with different units be aligned, and how should their relative importance be weighted? Here, we introduce the Differentiable Information Imbalance (DII), an automated method to rank information content between sets of features. Using distances in a ground truth feature space, DII identifies a low-dimensional subset of features that best preserves these relationships. Each feature is scaled by a weight, which is optimized by minimizing the DII through gradient descent. This allows simultaneously performing unit alignment and relative importance scaling, while preserving interpretability. DII can also produce sparse solutions and determine the optimal size of the reduced feature space. We demonstrate the usefulness of this approach on two benchmark molecular problems: (1) identifying collective variables that describe conformations of a biomolecule, and (2) selecting features for training a machine-learning force field. These results show the potential of DII in addressing feature selection challenges and optimizing dimensionality in various applications. The method is available in the Python library DADApy.
Graph-neural-network predictions of solid-state NMR parameters from spherical tensor decomposition
Mahmoud, Chiheb Ben, Rosset, Louise A. M., Yates, Jonathan R., Deringer, Volker L.
Nuclear magnetic resonance (NMR) is a powerful spectroscopic technique that is sensitive to the local atomic structure of matter. Computational predictions of NMR parameters can help to interpret experimental data and validate structural models, and machine learning (ML) has emerged as an efficient route to making such predictions. Here, we systematically study graph-neural-network approaches to representing and learning tensor quantities for solid-state NMR -- specifically, the anisotropic magnetic shielding and the electric field gradient. We assess how the numerical accuracy of different ML models translates into prediction quality for experimentally relevant NMR properties: chemical shifts, quadrupolar coupling constants, tensor orientations, and even static 1D spectra. We apply these ML models to a structurally diverse dataset of amorphous SiO$_2$ configurations, spanning a wide range of density and local order, to larger configurations beyond the reach of traditional first-principles methods, and to the dynamics of the $\alpha\unicode{x2013}\beta$ inversion in cristobalite. Our work marks a step toward streamlining ML-driven NMR predictions for both static and dynamic behavior of complex materials, and toward bridging the gap between first-principles modeling and real-world experimental data.
Creativity in AI: Progresses and Challenges
Ismayilzada, Mete, Paul, Debjit, Bosselut, Antoine, van der Plas, Lonneke
Creativity is the ability to produce novel, useful, and surprising ideas, and has been widely studied as a crucial aspect of human cognition. Machine creativity on the other hand has been a long-standing challenge. With the rise of advanced generative AI, there has been renewed interest and debate regarding AI's creative capabilities. Therefore, it is imperative to revisit the state of creativity in AI and identify key progresses and remaining challenges. In this work, we survey leading works studying the creative capabilities of AI systems, focusing on creative problem-solving, linguistic, artistic, and scientific creativity. Our review suggests that while the latest AI models are largely capable of producing linguistically and artistically creative outputs such as poems, images, and musical pieces, they struggle with tasks that require creative problem-solving, abstract thinking and compositionality and their generations suffer from a lack of diversity, originality, long-range incoherence and hallucinations. We also discuss key questions concerning copyright and authorship issues with generative models. Furthermore, we highlight the need for a comprehensive evaluation of creativity that is process-driven and considers several dimensions of creativity. Finally, we propose future research directions to improve the creativity of AI outputs, drawing inspiration from cognitive science and psychology.
Analysis of Convolutional Neural Network-based Image Classifications: A Multi-Featured Application for Rice Leaf Disease Prediction and Recommendations for Farmers
Paneru, Biplov, Paneru, Bishwash, Shah, Krishna Bikram
This study presents a novel method for improving rice disease classification using 8 different convolutional neural network (CNN) algorithms, which will further the field of precision agriculture. Tkinter-based application that offers farmers a feature-rich interface. With the help of this cutting-edge application, farmers will be able to make timely and well-informed decisions by enabling real-time disease prediction and providing personalized recommendations. Together with the user-friendly Tkinter interface, the smooth integration of cutting-edge CNN transfer learning algorithms-based technology that include ResNet-50, InceptionV3, VGG16, and MobileNetv2 with the UCI dataset represents a major advancement toward modernizing agricultural practices and guaranteeing sustainable crop management. Remarkable outcomes include 75% accuracy for ResNet-50, 90% accuracy for DenseNet121, 84% accuracy for VGG16, 95.83% accuracy for MobileNetV2, 91.61% accuracy for DenseNet169, and 86% accuracy for InceptionV3. These results give a concise summary of the models' capabilities, assisting researchers in choosing appropriate strategies for precise and successful rice crop disease identification. A severe overfitting has been seen on VGG19 with 70% accuracy and Nasnet with 80.02% accuracy. On Renset101, only an accuracy of 54% could be achieved, along with only 33% on efficientNetB0. A MobileNetV2-trained model was successfully deployed on a TKinter GUI application to make predictions using image or real-time video capture.
A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts
Bortolotti, Samuele, Marconato, Emanuele, Carraro, Tommaso, Morettin, Paolo, van Krieken, Emile, Vergari, Antonio, Teso, Stefano, Passerini, Andrea
The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance to safety and structural constraints. However, recent research observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts to the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available at: https://unitn-sml.github.io/rsbench.